@alaypatel07 alaypatel07 commented Nov 3, 2025

The perf-tests repository hardcodes Prometheus memory to 10 gigabytes when kubelet scrapes are enabled.

This is causing the Prometheus pod to OOM. Removing the kubelet scrape means the job will no longer collect driver metrics, but that trade-off is acceptable for now to unblock the 5k test run. In the future, we need a way to collect kubelet metrics without hardcoding Prometheus memory to 10 gigabytes.

Ref: #35700 (comment)

@k8s-ci-robot k8s-ci-robot added the area/config Issues or PRs related to code in /config label Nov 3, 2025
@k8s-ci-robot k8s-ci-robot requested a review from nojnhuh November 3, 2025 20:48
@k8s-ci-robot k8s-ci-robot added the following labels Nov 3, 2025:
  • cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
  • approved (Indicates a PR has been approved by an approver from all required OWNERS files.)
  • area/jobs
  • sig/scalability (Categorizes an issue or PR as relevant to SIG Scalability.)
  • sig/testing (Categorizes an issue or PR as relevant to SIG Testing.)
  • wg/device-management (Categorizes an issue or PR as relevant to WG Device Management.)
  • size/XS (Denotes a PR that changes 0-9 lines, ignoring generated files.)
Signed-off-by: Alay Patel <alayp@nvidia.com>
@alaypatel07 alaypatel07 force-pushed the gce-kops-5k-dra-remove-kubelet-scrapes branch from e42b162 to b4a8144 Compare November 3, 2025 20:57

@jackfrancis jackfrancis left a comment

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 3, 2025
@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alaypatel07, jackfrancis

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 5c4db91 into kubernetes:master Nov 3, 2025
6 checks passed
@k8s-ci-robot

@alaypatel07: Updated the job-config configmap in namespace default at cluster test-infra-trusted using the following files:

  • key sig-scalability-periodic-dra.yaml using file config/jobs/kubernetes/sig-scalability/DRA/sig-scalability-periodic-dra.yaml

@pohly pohly moved this from 🆕 New to ✅ Done in Dynamic Resource Allocation Nov 4, 2025